Abstract

In this paper, Weibull outlier tests based on three different statistics
are investigated with respect to their power optimality under various
alternative models. Two of the statistics are new in the context of outlier
statistics, and one of these is shown to provide a more powerful test in
certain situations than other more classical outlier test statistics. Critical
values of the three statistics were computer-generated and are tabulated.

The  tabulated  values  allow  one  to  identify  "treatment  effects"  resulting 
from  unsuspected  modifications  to  a  process  or  to  predict  failure  times  in 
a life test. Numerical examples are given.

1. Introduction

The  results  described  in  this  paper  pertain  to  the  detection  of  Weibull 
outliers  and  to  the  prediction  of  a  future  ordered  observation  in  an  ongoing 
life  test.  The  motivation  for  the  research  described  herein,  however,  is  the 
need  for  a  method  of  determining  whether  or  not,  in  a  retrospective  study, 
inordinately  long  times  to  failure  are  statistically  significant  and  thus 
possible  results  of  "treatment"  effects  caused  by  unsuspected  modifications  to 
a  process. 

Detection  of  outliers  (spurious  observations)  is  a  problem  that  has  long 
concerned  experimenters  and  data  analysts.  An  historical  survey  dealing  with 
outliers was given as early as 1891 by Czuber [4]. A more up-to-date expository
review  of  methods  for  detection  of  spurious  observations  was  presented  by 
Grubbs  [10].  The  latter  paper  is  a  modification  of  one  "prepared  primarily  for 
the  American  Society  for  Testing  Materials  and  represents  a  rather  extensive 
revision  of  an  earlier  Tentative  Recommended  Practice  ...  ." 

Grubbs points out that "almost all criteria for outliers are based on an
assumed underlying normal (Gaussian) population" and Anscombe [1], in an
extensive 1960 survey of the subject of outliers, makes an initial assumption of
normality for the data. (Discussion of the Anscombe paper and a paper by
Cuthbert Daniel [5], dealing with outliers in factorial experiments, is given
by William Kruskal, Thomas S. Ferguson, John Tukey, and E.G. Gumbel [15] and
stresses the importance of the outlier problem.)

Most  types  of  life  data  are  such  that  a  transformation  cannot  be  made  to 
impose  normality  on  the  underlying  distribution.  Thus,  the  traditional  tests 
for  and  methods  of  treatment  of  outliers  are  inappropriate  for  most  data  arising 
from life tests. A statistic for testing for outliers in general
location-scale families was recently proposed by Tiku [31] and shown to be more
powerful than various other statistics under Tiku's ([32], p. 1418) outlier models
("labelled slippage" models of Barnett [2] and Barnett and Lewis [3]),
although slightly less powerful under Dixon's [6] contamination models (see
Tiku [31, 32], Hawkins [14] and Tiku [33], p. 139). The null distribution
of Tiku's statistic is exactly Beta for the uniform and exponential
populations and approximately Beta for the normal population (Tiku [31, 32]);
the percentage points are not available for any other distribution.

In  the  study  described  in  the  sequel,  critical  values  were  generated  and 
have been tabulated for a variation of Tiku's statistic for a type-I
extreme-value model (one in which the observations are logarithms of two-parameter
Weibull  variates).  Critical  values  of  two  other  statistics,  shown  under 
certain  alternatives  to  be  superior  or  essentially  equivalent  in  terms  of 
power,  are  also  given.  Analysis  of  optimality  of  power  of  tests  is  given  in 
Section  3.3,  and  numerical  examples  are  provided  in  Section  4. 

2.  Motivation 

Often, during a life test, an experimenter has a need for an upper
confidence bound (a prediction interval) for the time of the last (nth) failure in
a size-n sample of test items. If the experimenter's data are two-parameter
Weibull, Table 1 can be used to provide such a prediction interval for sample
size n = 5(1)25, provided the first n-1, n-2, or n-3 failure times are known.

On the basis of the first k failure times, with n-k = 1, 2, 3, one can also use
Table 1 to obtain an upper confidence bound for the time of the (k+1)st failure.
By use of an approximation described in Section 3.2, it is also possible to
obtain upper prediction bounds for the (j+1)st failure based on the first j
failure times, with n-j = 2, 3, ..., n-2. This approximation can be applied for
sample sizes ranging from 3 to as large as required.

Notwithstanding  the  usefulness  of  the  results  herein  for  obtaining  certain 
prediction  intervals,  the  primary  motivation  for  the  research  described  in  the 
following was precipitated by analysis of data resulting from a large-scale
retrospective longitudinal study of times of individuals relapsing to undesirable
habitual behavior. Results of Mann and Rothberg [26] and Mann [21, 22] appear
to indicate that either a two-parameter Weibull model or a mixture of
two-parameter Weibulls is appropriate for "time-to-failure" or return to addictive
or other undesirable habitual behavior for longitudinal studies made on
individuals. Here, it is convenient to conceptualize independent intentions to abstain
from the behavior that wear out or otherwise fail in time. (Time-to-first
failure in a cohort has been studied in the case of prison recidivism by Harris
and Kaylan [13], who found that a mixture of two exponentials provided a good
fit for the data.)

What one is attempting to determine in applying an outlier test to
retrospective longitudinal time-to-failure data is whether or not "treatment effects"
may  have  resulted  in  specified  instances.  If  the  Weibull  outlier  test  indicates 
that  a  number  of  seemingly  inordinately  long  times  to  failure  are  significantly 
different  from  other  failure  times  of  an  individual,  then  one  can  attempt  to 
correlate  the  instances  involving  suspected  treatment  effects  with  various 
potential  causal  factors. 

Such an outlier test can be used, as well, to identify treatment effects in
hardware on the basis of life-test data. In such situations, identification
of an outlier will potentially allow one to discover inadvertent and/or
unsuspected modifications that may have been made to a manufacturing process. Note
that  the  immediate  goal  is  not  parameter  estimation,  as  in  many  situations,  and 
also  that  rather  large  numbers  of  outliers  are  a  definite  possibility. 

3.  Determination  of  Appropriate  Test 
3.1 Earlier Results

Tiku [31] defined $\hat{\sigma}$ to be the (size-n) maximum-likelihood estimator of
the scale parameter of a location-scale-parameter distribution (i.e., a
distribution $F_X(x)$ that is of the form $G[(x-\mu)/\sigma]$ for some G). He defined
$\hat{\sigma}_c$ to be the maximum-likelihood estimator of $\sigma$, or an estimator with the
asymptotic properties of the maximum-likelihood estimator of $\sigma$, calculated
from all the k < n ordered observations felt not to be outliers (considered
together as a censored size-n sample); i.e., $\hat{\sigma}_c$ is consistent,
asymptotically unbiased and efficient, and asymptotically normal for the cases he
considered and for the case considered here.

Tiku  then  proposed 

$T = h(\hat{\sigma}_c/\hat{\sigma})$  (3.1.1)

(where  h  is  a  suitable  constant)  as  a  statistic  for  testing  the  hypothesis 
that  the  sample  contains  no  outliers  versus  the  hypothesis  that  the  suspect 
observations  are  all  outliers.  He  demonstrated  empirically,  for  1,  2  and  4 
outliers,  n=10,  20  and  40,  that  the  statistic  T  has  higher  power  than 
certain  other  well  known  statistics  (see  Grubbs  [10],  Tietjen  and  Moore  [37], 
Shapiro  and  Wilk  [30]  and  Ferguson  [9])  under  Tiku's  [31,  32]  labelled  slippage 
models (Models A and B of Section 3.3). Note that Tiku's statistic is versatile:
(i) it can be used to test any specified number of outliers on either side
of an ordered sample, and (ii) it can be used to test whether the sample
contains outliers, irrespective of how many ([32], p. 1420). A multivariate
generalization  of  Tiku's  statistic  is  also  available  (Tiku  and  Singh,  [35]). 

Outliers  on  the  left  are  not  generally  of  interest  in  our  analyses. 

They  often  arise  because  inspections  of  hardware  or  tests  for  abstinence 
(such  as  urinalyses  to  test  for  opiates  and  other  drugs)  are  made  at 
discrete  time  intervals,  perhaps  weekly.  Thus,  small  values  are  relatively 
more  displaced  than  larger  values.  Because  of  the  logarithmic  transformation, 
any  displacement  of  small  values  is  magnified  as  well. 

Now, consider a sample with a single large suspected outlier from a
one-parameter exponential distribution with parameter $\sigma$. Here $\hat{\sigma}_c$ and
$\hat{\sigma}$ are equal to $S_{n-1}/(n-1)$ and $S_n/n$, respectively, where

$S_j = \sum_{i=1}^{j} X_{(i)} + (n-j)X_{(j)}$,

with $X_{(i)}$ the ith exponential order statistic. Thus, for this distribution
(in which $\sigma$ is both a location and scale parameter), the statistic T is
proportional to $(n-1)\hat{\sigma}_c/(n\hat{\sigma}) = S_{n-1}/S_n$, which is equal to
$S_{n-1}/[S_{n-1} + (X_{(n)} - X_{(n-1)})]$. If $U_k$ is defined to be
$(X_{(n)} - X_{(k)})/S_k$, then $(n-1)\hat{\sigma}_c/(n\hat{\sigma}) = (1 + U_{n-1})^{-1}$
in this single outlier case.

Lawless [16] proposed the use of $U_k$ for obtaining a prediction interval
on $X_{(n)}$, the nth ordered observation, from the first k observations in a
life test in which the data are exponential with parameter $\sigma$; and he
demonstrated that for (one-parameter) exponential data, $(n-1)U_{n-1}$ is distributed
as Snedecor's F with 2 and 2n-2 degrees of freedom.
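
As a minimal sketch of how this F result can be turned into a prediction bound (the function name and the failure times below are illustrative assumptions, not taken from the paper): an F variate with 2 and m degrees of freedom has the closed-form distribution function $1 - (1 + 2x/m)^{-m/2}$, so the 100p% upper prediction bound for $X_{(n)}$, given the first n-1 exponential failure times, is $X_{(n-1)} + S_{n-1}[(1-p)^{-1/(n-1)} - 1]$.

```python
def exp_last_failure_upper_bound(first_failures, conf):
    """Upper prediction bound for the last (n-th) failure time.

    first_failures: the n-1 smallest failure times from a sample of size n,
    assumed to be one-parameter exponential.
    conf: confidence level p, with 0 < p < 1.
    """
    x = sorted(first_failures)
    k = len(x)  # k = n - 1 observed failures
    # S_{n-1} = sum of the x_(i) plus (n - k) * x_(k), where n - k = 1 here.
    s = sum(x) + x[-1]
    # (n-1) * U_{n-1} is F(2, 2n-2); invert its closed-form distribution function.
    u_crit = (1.0 - conf) ** (-1.0 / k) - 1.0
    return x[-1] + s * u_crit


# Hypothetical data: the first 4 failure times in a sample of n = 5 items.
times = [12.0, 30.0, 47.0, 81.0]
bound = exp_last_failure_upper_bound(times, 0.90)
```

Because the bound depends on the data only through $X_{(n-1)}$ and $S_{n-1}$, no tables are needed in this exponential case; the Weibull case treated in the rest of the paper has no such closed form.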

Monte Carlo results exhibited in Table 3 demonstrate similarly that for
data from an extreme-value distribution (data that are ordered logarithms
$X_{(1)} < \cdots < X_{(n)}$ of sample observations from a two-parameter Weibull
distribution), the power of a test based on T is equivalent to the power of a
test based on the ratio of $(X_{(n)} - X_{(n-1)})$ and an estimate equivalent to the
maximum-likelihood estimate of the extreme-value scale parameter (the reciprocal
of the Weibull shape parameter) obtained from the first n-1 observations.

For more than a single large outlier, the statistic T defined above involves
observations that are not available in the prediction interval situation. Hence,
for any distribution, using a statistic similar to $U_k$, i.e., proportional
to $Q_{\ell-k} = (X_{(\ell)} - X_{(k)})/\hat{\sigma}_c$, $k < \ell \le n$, for testing for n-k outliers would
seem to be inefficient for n > k+1. It will be shown in Section 3.3 that this
is not necessarily so.


3.2  Test  Statistics  for  Weibull  Data 


We consider now the variate X, the logarithm of a Weibull variate, with

$F_X(x) = 1 - \exp[-\exp\{(x-\mu)/\sigma\}]$ for $x > 0$, and $F_X(x) = 0$ otherwise; $\sigma > 0$.  (3.2.1)

The parameter $\mu$ is a location parameter, the mode of the distribution
of X (the first asymptotic distribution of the smallest extreme) and is
the logarithm of the Weibull scale parameter. The parameter $\sigma$, which
determines the shape of the Weibull distribution, is a scale parameter of
the distribution of X, with $\pi^2\sigma^2/6$ the variance of X.

Since X has a location-scale parameter distribution, it is to be
expected that for the labelled slippage model of Tiku (see Section 3.3),
an efficient test statistic for testing for large outliers can be
provided by $T = h(\hat{\sigma}_c/\hat{\sigma})$. One might also consider statistics proportional
to $Q_{\ell-k}$, $k < \ell \le n$.

Results  of  Lawless  [16],  Thoman,  Bain  and  Antle  [36],  and  Mann  and 
Fertig  [23],  show  that  for  Weibull  data,  maximum-likelihood  and  best  linear 
invariant  estimators  yield  very  nearly  equal  numerical  results  and  their 
small-  and  large-sample  properties  (bias,  mean  squared  error,  etc.)  are 
very nearly equivalent. Thus, for testing that the largest n-k of n
sample observations are outliers, using T is essentially equivalent to
using as a test statistic $\tilde{\sigma}_{k,n}/\tilde{\sigma}_{n,n}$, the ratio of the best linear
invariant estimators of $\sigma$ based on the smallest k and on all n sample
observations, respectively. The power is obviously unchanged if one uses
$\sigma^*_{k,n}/\sigma^*_{n,n}$, the ratio of the best linear unbiased estimators of $\sigma$ based
on the smallest k and on all n sample observations, respectively. This
is true since best linear invariant and best linear unbiased estimators
of $\sigma$ differ only by a constant factor. See, for example, Mann [19].

In this study we considered specifically

$V_{n-k} = hT^{-1} = \tilde{\sigma}_{n,n}/\tilde{\sigma}_{k,n}$,

$Q_{n-k} = (X_{(n)} - X_{(k)})/\tilde{\sigma}_{k,n}$

and

$W_{n-k} = Q_{(k+1)-k} = (X_{(k+1)} - X_{(k)})/\tilde{\sigma}_{k,n}$.

Note that $Q_{n-k}$ and $W_{n-k}$ yield gap tests somewhat similar to some suggested
by Dixon [6]. Critical values of these statistics for testing for large
outliers, or predicting later failure times, at 0.20, 0.10, 0.05 and 0.01
significance levels for n = 5(1)25, n-k = 1, 2, 3, are displayed in Table 1,
and an example of their use is given in Section 4.
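
The three statistics are simple ratios once the two scale estimates are in hand. The following is a minimal sketch, assuming the caller supplies the best linear invariant estimates (the weights needed to compute them come from published tables and are not reproduced here); the function name is hypothetical. The inputs are the values quoted for the first example of Section 4, with $x_{(k)}$ placed at 0 because only the gaps are reported there and the statistics do not depend on location.

```python
def outlier_statistics(x_k, x_k1, x_n, sigma_kn, sigma_nn):
    """Return (V, Q, W) for testing the largest n-k ordered log-observations.

    x_k, x_k1, x_n: ordered log failure times X_(k), X_(k+1) and X_(n).
    sigma_kn, sigma_nn: best linear invariant estimates of sigma based on the
    smallest k and on all n observations (supplied by the caller).
    """
    v = sigma_nn / sigma_kn           # ratio of scale estimates
    q = (x_n - x_k) / sigma_kn        # gap from X_(k) to the largest value
    w = (x_k1 - x_k) / sigma_kn       # gap from X_(k) to the next value
    return v, q, w


# n = 9, k = 7: gaps 0.872 and 0.693, estimates 0.709 and 0.884 (Section 4).
v, q, w = outlier_statistics(0.0, 0.693, 0.872, 0.709, 0.884)
```

With these rounded inputs the sketch gives values close to the 1.245, 1.228 and 0.976 reported in Section 4; the small differences presumably reflect rounding of the quoted inputs.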

The values shown for $V_{n-k}$ and $Q_{n-k}$ were generated simultaneously by
means of 20,000 Monte Carlo simulations. The exhibited values of $W_{n-k}$ were
generated by making use of the fact that, for $k \le n-2$ (the restriction
having been discovered in this research),

$F_k = \{(X_{(k+1)} - X_{(k)})/E(X_{(k+1)} - X_{(k)})\}/\{\tilde{\sigma}_{k,n}/E(\tilde{\sigma}_{k,n})\}$

has approximately a classical F distribution. This is discussed in Mann,
Schafer, Singpurwalla [27], pp. 255-256.

In order to generate the tabulated values of $W_{n-k}$ using the F
approximation, it was necessary to use stored values of the expectations of the
reduced order statistic $Y_{i,n} = (X_{(i)} - \mu)/\sigma$, i = k, k+1, and of $C_{k,n}$,
where $\sigma/(1 + C_{k,n})$ is the expectation of $\tilde{\sigma}_{k,n}$ and $C_{k,n}\sigma^2$ is the
variance of $\sigma^*_{k,n} = (1 + C_{k,n})\tilde{\sigma}_{k,n}$, the best linear unbiased estimator
of $\sigma$, based on the smallest k observations of X. Thus,

$F_k = (X_{(k+1)} - X_{(k)})/[E(Y_{k+1,n} - Y_{k,n})\,\tilde{\sigma}_{k,n}(1 + C_{k,n})]$.

The degrees of freedom for the approximate F variate are based on a
result of Patnaik [24], which specifies for $\phi$, with $E(\phi) = m$, $\mathrm{var}(\phi) = v$,
and $m^2$ proportional to v, that $2m\phi/v$ is approximately a chi-squared variate
with $2m^2/v$ degrees of freedom. Thus, we have for $F_k$,
$\nu_1 = 2[E(Y_{k+1,n} - Y_{k,n})]^2/\mathrm{var}(Y_{k+1,n} - Y_{k,n}) \approx 2$ and $\nu_2 = 2/C_{k,n}$ degrees of
freedom. Values of $\mathrm{var}(Y_{k+1,n} - Y_{k,n})$, n-k = 2, 3; n = 5(1)25, were calculated
from stored values, along with the other constants needed for the
computations. (See below for the origin of these constants.)
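
As a short worked step (a sketch that follows only from the definitions above), the denominator degrees of freedom can be obtained by applying Patnaik's rule to the best linear unbiased estimator, which has mean $\sigma$ and variance $C_{k,n}\sigma^2$:

```latex
\[
m = E(\sigma^{*}_{k,n}) = \sigma, \qquad
v = \operatorname{var}(\sigma^{*}_{k,n}) = C_{k,n}\,\sigma^{2}
\quad\Longrightarrow\quad
\nu_{2} = \frac{2m^{2}}{v}
        = \frac{2\sigma^{2}}{C_{k,n}\,\sigma^{2}}
        = \frac{2}{C_{k,n}} .
\]
```

The unknown $\sigma$ cancels, so $\nu_2$ depends only on the tabulated constant $C_{k,n}$. The same value applies to the best linear invariant estimator, since it differs from the unbiased one only by the constant factor $(1 + C_{k,n})$.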

The  values  obtained  from  the  F  approximation  were  compared  with  trial 
simulations  having  a  Monte  Carlo  sample  size  of  20,000  to  ensure  that  the 
tabulated  values  are  sufficiently  precise.  The  agreement  increases  as 
significance level $\alpha$ decreases. That is, higher percentile values are more
precise.  Also,  precision  increases  as  sample  size  n  increases  and  as  k 
decreases.  Examples  of  comparison  with  Monte  Carlo  values  are  shown  in 
Table  4. 

The method used for obtaining the F values with noninteger degrees of
freedom is described in Mann, Schafer, Singpurwalla [27], pp. 172-173.

This method, along with values of $E(Y_{k+1,n} - Y_{k,n})$,
tabulated in Mann et al. [27], pp. 342-347, for n = 3(1)16, and Mann, Scheuer
and Fertig [28], for n = 3(1)25, and values of $C_{k,n}$, which can be obtained
from values appearing in Mann et al. [27], pp. 194-207, for n = 2(1)13
and in Mann [19], for n = 2(1)25, can be used to estimate the critical
values of $W_{n-k}$ for n-k > 3. In these cases, one can use $\nu_1 = 2$ along with
$\nu_2 = 2/C_{k,n}$ for the degrees of freedom or can calculate $\nu_1$ more precisely
using values of the variances and the covariance of $Y_{k+1,n}$ and $Y_{k,n}$
available in Mann [20].
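
A minimal sketch of this calculation for the simpler choice $\nu_1 = 2$ follows. It relies on a standard closed form, not stated in the paper: an F variate with 2 and $\nu_2$ degrees of freedom has distribution function $1 - (1 + 2x/\nu_2)^{-\nu_2/2}$, which holds for noninteger $\nu_2$ as well. The function names and the numerical constants in the usage line are hypothetical; real values of $E(Y_{k+1,n} - Y_{k,n})$ and $C_{k,n}$ must be taken from the cited tables.

```python
def f2_quantile(p, nu2):
    """p-quantile of an F variate with 2 and nu2 degrees of freedom.

    Inverts P(F <= x) = 1 - (1 + 2x/nu2)**(-nu2/2), valid for any real
    nu2 > 0, so noninteger degrees of freedom need no interpolation.
    """
    return 0.5 * nu2 * ((1.0 - p) ** (-2.0 / nu2) - 1.0)


def w_critical_value(alpha, e_gap, c_kn):
    """Approximate upper-alpha critical value of W_{n-k}, taking nu1 = 2.

    e_gap: E(Y_{k+1,n} - Y_{k,n}), the expected reduced gap (from tables).
    c_kn:  the constant C_{k,n} (from tables).
    Since W = F_k * e_gap * (1 + c_kn), the F quantile is simply rescaled.
    """
    nu2 = 2.0 / c_kn
    return e_gap * (1.0 + c_kn) * f2_quantile(1.0 - alpha, nu2)


# Hypothetical constants for illustration only (not values from the tables).
crit = w_critical_value(alpha=0.10, e_gap=0.35, c_kn=0.16)
```

Note that the tabulated values of Table 1 for n-k = 2, 3 were computed with a more precise $\nu_1$, so this sketch corresponds only to the cruder approximation suggested for n-k > 3.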

For samples larger than 25 and n-k > 1, one can use the approximation
with asymptotic expressions for expectations, variances and covariances of
the order statistics available in Mann et al. [27], p. 218, and an
asymptotic expression for $C_{k,n}$ available in Harter and Moore [12].

As noted earlier, maximum-likelihood estimates can be substituted for
$\tilde{\sigma}_{k,n}$ and for $\tilde{\sigma}_{n,n}$, and the values in Table 1 can be used directly with these
estimates without any modification required. One can also use $\sigma^*_{k,n}$ and
$\sigma^*_{n,n}$, best linear unbiased estimates (see Mann [19]) or simplified linear
estimates (see Mann et al. [27], pp. 210-212, Mann and Fertig [24] or
Engelhardt and Bain [7, 8]), in place of the best linear invariant estimates.
In this case, the modified statistics $Q_{n-k}$ and $W_{n-k}$ need to be multiplied
by the factor CQW = $(1 + C_{k,n})$ and the modification of $V_{n-k}$ needs to be
multiplied by CV = $(1 + C_{k,n})/(1 + C_{n,n})$ before comparison with critical
values. In other words, $\sigma^*_{k,n}$ and $\sigma^*_{n,n}$ need to be divided by $(1 + C_{k,n})$
and $(1 + C_{n,n})$, respectively, to convert them to $\tilde{\sigma}_{k,n}$ and $\tilde{\sigma}_{n,n}$. Values
of the constants CQW and CV appear in Table 2 for n = 5(1)25, n-k = 1, 2, 3.
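
The rescaling just described can be sketched as follows. The function name is hypothetical, and in the usage lines the value 0.10 for $C_{n,n}$ is an assumed placeholder rather than a tabulated constant; the value 0.161 for $C_{7,9}$ corresponds to CQW = 1.161 as quoted in Section 4.

```python
def to_invariant_scale(v_blue, q_blue, w_blue, c_kn, c_nn):
    """Rescale statistics computed from best linear unbiased estimates.

    v_blue, q_blue, w_blue: V, Q and W computed with sigma*_{k,n} and
    sigma*_{n,n} in place of the best linear invariant estimates.
    c_kn, c_nn: the constants C_{k,n} and C_{n,n}.
    Returns values that can be compared with the Table 1 critical values.
    """
    cqw = 1.0 + c_kn
    cv = (1.0 + c_kn) / (1.0 + c_nn)
    return v_blue * cv, q_blue * cqw, w_blue * cqw


# Unbiased estimates implied by invariant estimates 0.709 and 0.884.
s_kn = 0.709 * (1.0 + 0.161)
s_nn = 0.884 * (1.0 + 0.10)   # 0.10 is a placeholder for C_{n,n}
v_adj, q_adj, w_adj = to_invariant_scale(
    s_nn / s_kn, 0.872 / s_kn, 0.693 / s_kn, c_kn=0.161, c_nn=0.10
)
```

After rescaling, the statistics coincide with those computed directly from the invariant estimates, which is why a single table of critical values serves both kinds of estimate.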

Approximations to $Q_{n-k}$ and $W_{n-k}$ can be calculated by using probability
plots such as those shown in Figures 1 and 2. Here, the inverse of the
slope of the line plotted on the basis of the smallest k observations gives
an approximation to $\sigma^*_{k,n}$; and the inverses of the slopes of the line formed
by the nth and kth points and of the line formed by the (k+1)st and kth
points give approximations to $(X_{(n)} - X_{(k)})/E(Y_{n,n} - Y_{k,n})$ and
$(X_{(k+1)} - X_{(k)})/E(Y_{k+1,n} - Y_{k,n})$, respectively.

If the inverse of the slope in the probability plots is used, then
the constant factor

CQP = $E(Y_{n,n} - Y_{k,n})(1 + C_{k,n})$ or

CWP = $E(Y_{k+1,n} - Y_{k,n})(1 + C_{k,n})$

must be used to multiply the value obtained to convert it to one that can
be compared with the critical values for $Q_{n-k}$ or $W_{n-k}$, respectively. Values
of CQP and CWP are given in Table 2 for n = 5(1)25; n-k = 1, 2, 3.

Special probability papers, each one applicable to a specified sample
size, have been designed (see [18]), so that individuals without technical
training can plot failure times of interest. Without making such plots,
one will usually find it very difficult to have much feeling for what
might be moderately large values for time-to-failure when the data are
Weibull. Using the plots with some minimal instruction, a nontechnical
person should be able to determine slopes of lines formed by $x_{(1)}, \ldots, x_{(k)}$ and
by $x_{(k)}$ and $x_{(n)}$ or $x_{(k)}$ and $x_{(k+1)}$. This assists a spouse, a "significant other"
or a counsellor of a subject engaging in undesirable habitual behavior
to gain insight into what might be, for this subject, motivation for
long-term abstinence.

3.3  Optimality  of  Power  Under  the  Two  Alternatives 

For a Weibull model, the hypothesis $H_0$ to be tested is:

$X_{(1)}, \ldots, X_{(n)}$ are order statistics from

$f_X(x) = (1/\sigma)\, g[(x-\mu)/\sigma]$  (3.3.1)

where $f_X(x)$ is the density function corresponding to the distribution
function (3.2.1). Model A and Model B are given, respectively, by

A: $X_{(1)}, \ldots, X_{(k)}$ are the smallest k order statistics from (3.3.1) and
$X_{(k+1)}, \ldots, X_{(n)}$ are the largest n-k order statistics from

$f_X(x) = (1/\sigma)\, g\{[x - (\mu + \delta\sigma)]/\sigma\}$

and

B: $X_{(1)}, \ldots, X_{(k)}$ are the smallest k order statistics from (3.3.1) and
$X_{(k+1)}, \ldots, X_{(n)}$ are the largest n-k order statistics from

$f_X(x) = [1/(\lambda\sigma)]\, g[(x - \mu)/(\lambda\sigma)]$.

These models may not correspond to the manner in which data are generated
for the situation described. Nonetheless, a mixture of any two specified
Weibull distributions can be represented by a mixture of Models A and
B if the "outliers" are larger than other values and the number of
outliers is only one or two. Models A and B can be combined also to
approximate very well nearly any model that is a mixture of a Weibull
sample of small values and a Weibull sample of larger values (the
"outliers").

Examples of Model A and Model B are shown as probability plots (on
Weibull probability paper) in Figures 1 and 2, respectively. It was the
object of the research described in this paper to determine test statistics
that are optimal, in terms of power considerations, for testing for outliers,
in general, and for testing against Model A or Model B, or a mixture of
these, in particular. To this end, the power of the various test statistics
under consideration was calculated by 2000 Monte Carlo simulations (in
addition to the 20,000 used to generate critical values for the test
statistics). These power calculations were made for each critical value
generated for $V_{n-k}$, $Q_{n-k}$ and for selected sample sizes for $W_{n-k}$, for Model A:
$\delta$ = 0.5, 1; Model B: $\lambda$ = 2, 5; and mixed models: $\delta$ = 1; $\lambda$ = 2, 5. Illustrative
examples are exhibited in Table 3. Note that only for the test statistic
$W_{n-k}$ (under Model A with n > 10) does the power increase as n-k increases.
This is probably due to the fact that observations near to $\mu$ are closer
together than observations near the tail. Hence, displacement of $1\sigma$ is
less critical near the tail.

On the basis of the many simulations that were made, it has been well
established that when one is testing $H_0$ versus a single outlier, a test
based on $Q_{n-k} = Q_{n-(n-1)} = Q_{(k+1)-k} = W_{n-(n-1)}$ has power essentially
identical to that of one based on $V_{n-(n-1)} = hT^{-1}$. This was pointed out
in Section 3.1.

Tiku [27] has demonstrated for Gaussian families that a test based
on the statistic T has higher power in more general situations (more than
a single outlier) than other classical outlier tests under his labelled
slippage model. As the number of outliers, n-k, increases, however, the
ratio of the power of $W_{n-k}$ relative to the power of $V_{n-k}$ increases under
Model A (shift in location). That is to say, under Model A, a test based
on a measure of the gap $(X_{(k+1)} - X_{(k)})$ between the smallest suspected
outlier and the largest observation thought not to be an outlier, relative
to a measure of the dispersion ($\tilde{\sigma}_{k,n}$) of the observations thought not to
be outliers, is more powerful than one based on T (see Table 3). It is
clear from Figure 1 that for Model A, it is essentially this quantity,
i.e., the size of the gap relative to the dispersion of the smaller
observations, that is the critical factor in establishing the suspicion of
outliers. Thus, it is not unlikely that a test based on a statistic such
as $W_{n-k}$, involving $X_{(k+1)} - X_{(k)}$, is optimal for alternative models
resembling Model A.

If it were established that Model A was precisely the alternative
(which it usually will not be), then using in the denominator of $W_{n-k}$
an estimator of $\sigma$ that involves all differences of successive order statistics
except $X_{(k+1)} - X_{(k)}$ would be more powerful than $W_{n-k}$ as it is defined. Such
a test would be equivalent, in terms of power, to one having this statistic
in the denominator and $\tilde{\sigma}_{n,n}$ in the numerator and should be optimal for the
labelled slippage model with Model A as the alternative. Note that Mann
and Fertig [25] demonstrate that for a goodness-of-fit test involving
gaps (which all estimates of $\sigma$ in location-scale families involve) the
important consideration in determining optimality is which gaps are
involved in the test and in what position, rather than how the gaps are
combined. That is to say, an optimal estimator of $\sigma$ based on the first
k-1 gaps performs no better than one which is the sum of each of the
k-1 gaps divided by its expectation.

In this context it is noted that the statistic $V^* = \tilde{\sigma}_{k+1,n}/\tilde{\sigma}_{k,n}$ for
k = n-2, n-3, ..., has the same functional relationship with $W_{n-k}$ that
$V_{n-(n-1)}$ has with $Q_{n-(n-1)}$. Therefore, the statistic $W_{n-k}$ also has
essentially the same power as $V^*$. The inverse of $V^*$ is a special case
of Z, a statistic proposed by Tiku [34, eq. 1.4] for testing goodness
of fit when $H_0$ is exponentiality. The statistic Z is equivalent to
$V^*$ when the exponential censored sample of size n consists only of the
smallest k+1 observations. Thus, Z stresses the difference of the two
largest observed order statistics.

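It should also be pointed out (see [24]) that $\mu^*$ and $\tilde{\mu}$ based on
$X_{(1)}, \ldots, X_{(k)}$, k < n, from an extreme-value distribution are of the
approximate form $X_{(k)} + c\,\tilde{\sigma}_{k,n}$, where c is an appropriate constant. Thus,
a test of form $(X_{(k+1)} - \tilde{\mu}_{k,n})/\tilde{\sigma}_{k,n}$ is essentially the test $W_{n-k}$.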


For Model B, the critical factor is the ratio of the slopes of the
plots of the smallest k and the largest n-k+1 observations. For this
model, $W_{n-k}$ performs poorly relative to $V_{n-k}$, as one might suspect, but
$Q_{n-k}$, which is proportional to the ratio of estimates of these two slopes,
approximates $V_{n-k}$ very well, i.e., powers of $V_{n-k}$ and $Q_{n-k}$ are very nearly
equivalent. See Table 3. The statistic $Q_{n-k}$ is shown (in Table 3) to
perform very poorly, in terms of power, under Model A, however.

For a mixture of the models, results shown in Table 3 indicate that a
test based on $W_{n-k}$ tends to be most powerful, with $Q_{n-k}$ performing most
poorly. Again, as with Model A, the gap $X_{(k+1)} - X_{(k)}$ relative to $\tilde{\sigma}_{k,n}$
appears to be the most critical factor.
It seems clear from this study that in considering whether or not to
test for outliers, one should, if possible, plot the data on probability
paper. Plotting is useful in providing perspective even though there is
a single suspected outlier. For more than a single outlier, plotting is
essential if one is to know whether to use $W_{n-k}$ (for Model A or mixed
models) or either $V_{n-k}$ or $Q_{n-k}$ (for Model B) or $V_{n-k}$ (for a more general
alternative model). In this way one can ensure using a test with what
appears to be optimal power.

Clearly, the power of the outlier test is affected by the a priori
analysis, as is always the case to some extent in looking at the data
before performing an outlier test. However, in this context it is
important to identify large outliers in order to determine if treatment effects
(extending  life  or  for  human  subjects,  extending  periods  of  abstinence) 
have  resulted  and  what  might  have  caused  such  effects.  The  goal  is  not 
primarily  one  of  estimation  of  parameters,  but  rather  of  exploration. 

This  point  is  discussed  by  Barnett  and  Lewis  [3],  pp.  5-6. 

Finally, it is to be noted that the results obtained here are likely
to extend to other location-scale families. Thus, an analog of $W_{n-k}$
involving the gap $X_{(k+1)} - X_{(k)}$ will possibly tend to be more powerful for
any location-scale family (including Gaussian distributions) for testing
$H_0$ under Model A than is the statistic T.

4.  Numerical  Examples 

The data in the probability plots (Figures 1 and 2) are used here to
provide examples of the use of the various test statistics.

First, we consider Figure 1, which exhibits two possible outliers from a
mixed model with $\sigma < 1$. Here n is equal to 9, so that tables in [22] can be
used to obtain the weights to calculate $\tilde{\sigma}_{7,9} = 0.709$ and $\tilde{\sigma}_{9,9} = 0.884$. Also
$x_{(9)} - x_{(7)} = 0.872$ and $x_{(8)} - x_{(7)} = 0.693$. Thus, $v_{9-7} = 1.245$, $q_{9-7} = 1.228$
and $w_{9-7} = 0.976$. Comparing these values with the tabulated critical values,
one finds that if the specified significance level is 0.10, then only the
test statistic $w_{9-7}$, involving $x_{(8)} - x_{(7)}$, rejects the hypothesis of no
outliers.

The plotted line drawn (by hand) in Figure 1 gives highest weight to the
kth, or in this case, the seventh value, as do the weights for optimal linear
estimates of $\sigma$, such as $\tilde{\sigma}$ and $\sigma^*$. Also note that horizontal, rather than
vertical, distances from points should be minimized. The slope of the line
is about 1.20 so that an approximation to $\sigma^*_{7,9}$ is about 0.833. This gives
0.717 as an approximation to $\tilde{\sigma}_{7,9}$ with the use of CQW = 1.161 (found in
Table 2) as a divisor.
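
The arithmetic of this graphical approximation can be sketched as below. The function name is hypothetical; the slope 1.20 and the constant CQW = 1.161 are the values quoted above for the Figure 1 data.

```python
def sigma_from_plot_slope(slope, cqw):
    """Approximate scale estimates from a hand-fitted probability-plot line.

    slope: slope of the line through the smallest k plotted points.
    cqw:   the Table 2 constant CQW = 1 + C_{k,n}.
    Returns (sigma_star, sigma_tilde): approximations to the best linear
    unbiased and best linear invariant estimates of sigma.
    """
    sigma_star = 1.0 / slope          # inverse slope approximates sigma*
    sigma_tilde = sigma_star / cqw    # dividing by CQW gives sigma-tilde
    return sigma_star, sigma_tilde


star, tilde = sigma_from_plot_slope(1.20, 1.161)
```

Here `star` is about 0.833 and `tilde` is about 0.718, which agrees with the quoted 0.717 to within rounding and is close to the value 0.709 computed from the tabulated weights.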

The plot in Figure 2 suggests 3 large outliers of the general type
specified by Model B. Thus, using tabulated values in [14], one finds
$\tilde{\sigma}_{11,14} = 0.84226$ and $\tilde{\sigma}_{14,14} = 1.353$, together with the gaps $x_{(14)} - x_{(11)}$ and
$x_{(12)} - x_{(11)}$, so that $v_{14-11} = 1.595$, $q_{14-11} = 2.457$ and $w_{14-11} = 0.660$. In comparing
these values with the critical values of Table 1, one finds that if the
specified significance level is 0.10, all three test statistics reject a
"no outliers" hypothesis. The statistics $v_{14-11}$ and $q_{14-11}$ reject also at
the 0.05 significance level, while $w_{14-11}$ does not. This is to be expected
since the probability plot demonstrates that the appropriate test statistic
is $v_{14-11}$ or $q_{14-11}$.

The slope of the line plotted in Figure 2 is about 1.1, giving an
approximation of about 0.91/CQW = 0.91/1.076 = 0.845 for $\tilde{\sigma}_{11,14} = 0.842$. Note that
again, the kth value has been weighted most heavily.